Bootstrapping Large-scale Named Entities using URL-Text Hybrid Patterns

نویسندگان

  • Chao Zhang
  • Shiqi Zhao
  • Haifeng Wang
چکیده

Automatically mining named entities (NE) is an important but challenging task, pattern-based and bootstrapping strategy is the most widely accepted solution. In this paper, we propose a novel method for NE mining using web document titles. In addition to the traditional text patterns, we propose to use url-text hybrid patterns that introduce url criterion to better pinpoint high-quality NEs. We also design a multiclass collaborative learning mechanism in bootstrapping, in which different patterns and different classes work together to determine better patterns and NE instances. Experimental results show that the precision of NEs mined with the proposed method is 0.96 and 0.94 on Chinese and English corpora, respectively. Comparison result also shows that the proposed method significantly outperforms a representative method that mines NEs from large-scale query logs.

برای دانلود متن کامل این مقاله و بیش از 32 میلیون مقاله دیگر ابتدا ثبت نام کنید

ثبت نام

اگر عضو سایت هستید لطفا وارد حساب کاربری خود شوید

منابع مشابه

A Bootstrapping Approach for Geographic Named Entity Annotation

Geographic named entities can be classified into many subtypes that are useful for applications such as information extraction and question answering. In this paper, we present a bootstrapping algorithm for the task of geographic named entity annotation. In the initial stage, we annotate a raw corpus using seeds. From the initial annotation, boundary patterns are learned and applied to the corp...

متن کامل

بهبود شناسایی موجودیت‌های نامدار فارسی با استفاده از کسره اضافه

Named entity recognition is a process in which the people’s names, name of places (cities, countries, seas, etc.) and organizations (public and private companies, international institutions, etc.), date, currency and percentages in a text are identified. Named entity recognition plays an important role in many NLP tasks such as semantic role labeling, question answering, summarization, machine ...

متن کامل

Self-Adjustable BootStrapping for Web-Scale Named Entity Extraction using N-grams

Named Entity Extraction refers to task of identifying and extracting mentions of names like person names, locations, time expressions, monetary values etc from text. There have different approaches to Named Entity extraction and classification based on supervised and semi-supervised learning. This paper describes a bootstrapping approach to extracing Named Entities for 150 categories from Wikip...

متن کامل

A Novel Approach to Conditional Random Field-based Named Entity Recognition using Persian Specific Features

Named Entity Recognition is an information extraction technique that identifies name entities in a text. Three popular methods have been conventionally used namely: rule-based, machine-learning-based and hybrid of them to extract named entities from a text. Machine-learning-based methods have good performance in the Persian language if they are trained with good features. To get good performanc...

متن کامل

Named Entity Recognition in Persian Text using Deep Learning

Named entities recognition is a fundamental task in the field of natural language processing. It is also known as a subset of information extraction. The process of recognizing named entities aims at finding proper nouns in the text and classifying them into predetermined classes such as names of people, organizations, and places. In this paper, we propose a named entity recognizer which benefi...

متن کامل

ذخیره در منابع من


  با ذخیره ی این منبع در منابع من، دسترسی به آن را برای استفاده های بعدی آسان تر کنید

برای دانلود متن کامل این مقاله و بیش از 32 میلیون مقاله دیگر ابتدا ثبت نام کنید

ثبت نام

اگر عضو سایت هستید لطفا وارد حساب کاربری خود شوید

عنوان ژورنال:

دوره   شماره 

صفحات  -

تاریخ انتشار 2013